
52 research outputs found

A Tight Excess Risk Bound via a Unified PAC-Bayesian-Rademacher-Shtarkov-MDL Complexity

Author: Grünwald Peter D.
Mehta Nishant A.
Publication venue
Publication date: 20/10/2017
Field of study

We present a novel notion of complexity that interpolates between and generalizes some classic existing complexity notions in learning theory: for estimators like empirical risk minimization (ERM) with arbitrary bounded losses, it is upper bounded in terms of data-independent Rademacher complexity; for generalized Bayesian estimators, it is upper bounded by the data-dependent information complexity (also known as stochastic or PAC-Bayesian, $\mathrm{KL}(\text{posterior} \,\|\, \text{prior})$ complexity). For (penalized) ERM, the new complexity reduces to (generalized) normalized maximum likelihood (NML) complexity, i.e. a minimax log-loss individual-sequence regret. Our first main result bounds excess risk in terms of the new complexity. Our second main result links the new complexity via Rademacher complexity to $L_2(P)$ entropy, thereby generalizing earlier results of Opper, Haussler, Lugosi, and Cesa-Bianchi who did the log-loss case with $L_\infty$. Together, these results recover optimal bounds for VC- and large (polynomial entropy) classes, replacing localized Rademacher complexity by a simpler analysis which almost completely separates the two aspects that determine the achievable rates: 'easiness' (Bernstein) conditions and model complexity. Comment: 38 pages

arXiv.org e-Print Archive

CWI's Institutional Repository

Mathematics Is Physics

Author: E. Landry
Lewis Carroll
Peter D. Grünwald
Réka Albert
Stefan Banach
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 11/08/2015
Field of study

In this essay, I argue that mathematics is a natural science---just like physics, chemistry, or biology---and that this can explain the alleged "unreasonable" effectiveness of mathematics in the physical sciences. The main challenge for this view is to explain how mathematical theories can become increasingly abstract and develop their own internal structure, whilst still maintaining an appropriate empirical tether that can explain their later use in physics. In order to address this, I offer a theory of mathematical theory-building based on the idea that human knowledge has the structure of a scale-free network and that abstract mathematical theories arise from a repeated process of replacing strong analogies with new hubs in this network. This allows mathematics to be seen as the study of regularities, within regularities, within ..., within regularities of the natural world. Since mathematical theories are derived from the natural world, albeit at a much higher level of abstraction than most other scientific theories, it should come as no surprise that they so often show up in physics. This version of the essay contains an addendum responding to Sylvia Wenmackers' essay and comments that were made on the FQXi website. Comment: 15 pages, LaTeX. Second prize winner in 2015 FQXi Essay Contest (see http://fqxi.org/community/forum/topic/2364

arXiv.org e-Print Archive

Crossref

Chapman University Digital Commons

Fast rates in statistical and online learning

Author: Grünwald Peter D.
Mehta Nishant A.
Reid Mark D.
van Erven Tim
Williamson Robert C.
Publication venue
Publication date: 01/01/2015
Field of study

The speed with which a learning algorithm converges as it is presented with more data is a central problem in machine learning --- a fast rate of convergence means less data is needed for the same level of performance. The pursuit of fast rates in online and statistical learning has led to the discovery of many conditions in learning theory under which fast learning is possible. We show that most of these conditions are special cases of a single, unifying condition that comes in two forms: the central condition for 'proper' learning algorithms that always output a hypothesis in the given model, and stochastic mixability for online algorithms that may make predictions outside of the model. We show that under surprisingly weak assumptions both conditions are, in a certain sense, equivalent. The central condition has a re-interpretation in terms of convexity of a set of pseudoprobabilities, linking it to density estimation under misspecification. For bounded losses, we show how the central condition enables a direct proof of fast rates and we prove its equivalence to the Bernstein condition, itself a generalization of the Tsybakov margin condition, both of which have played a central role in obtaining fast rates in statistical learning. Yet, while the Bernstein condition is two-sided, the central condition is one-sided, making it more suitable to deal with unbounded losses. In its stochastic mixability form, our condition generalizes both a stochastic exp-concavity condition identified by Juditsky, Rigollet and Tsybakov and Vovk's notion of mixability. Our unifying conditions thus provide a substantial step towards a characterization of fast rates in statistical learning, similar to how classical mixability characterizes constant regret in the sequential prediction with expert advice setting. Comment: 69 pages, 3 figures

arXiv.org e-Print Archive

CWI's Institutional Repository

Leiden University Scholarly Publications

The No-Free-Lunch Theorems of Supervised Learning

Author: Grünwald Peter D.
Sterkenburg Tom F.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2021
Field of study

The no-free-lunch theorems promote a skeptical conclusion that all possible machine learning algorithms equally lack justification. But how could this leave room for a learning theory that shows that some algorithms are better than others? Drawing parallels to the philosophy of induction, we point out that the no-free-lunch results presuppose a conception of learning algorithms as purely data-driven. On this conception, every algorithm must have an inherent inductive bias that wants justification. We argue that many standard learning algorithms should rather be understood as model-dependent: in each application they also require for input a model, representing a bias. Generic algorithms themselves, they can be given a model-relative justification.

arXiv.org e-Print Archive

PhilSci Archive

Suboptimal behavior of Bayes and MDL in classification under misspecification

Author: A. Blumer
A. R. Barron
B. Clarke
C. S. Wallace
D. Blackwell
D. Heckerman
J. Quinlan
J. Rissanen
John Langford
K. Yamanishi
M. E. Tipping
M. Kearns
O. Bunke
P. D. Grünwald
P. Diaconis
Peter Grünwald
R. Meir
Publication venue: 'Springer Science and Business Media LLC'
Publication date
Field of study

Crossref

The FLUXNET2015 dataset and the ONEFlux processing pipeline for eddy covariance data

Author: Agarwal Deb
Amiro Brian
Ammann Christof
Arain M. Altaf
Ardö Jonas
Arkebauer Timothy
Arndt Stefan K.
Arriga Nicola
Aubinet Marc
Aurela Mika
Baldocchi Dennis
Barr Alan
Beamesderfer Eric
Bergeron Onil
Beringer Jason
Bernhofer Christian
Berveiller Daniel
Billesbach Dave
Biraud Sebastien
Black Thomas Andrew
Blanken Peter D.
Bohrer Gil
Boike Julia
Bolstad Paul V.
Bonal Damien
Bonnefond Jean Marc
Bowling David R.
Bracho Rosvel
Brodeur Jason
Brümmer Christian
Buchmann Nina
Burban Benoit
Burns Sean P.
Buysse Pauline
Cale Peter
Canfora Eleonora
Cavagna Mauro
Cellier Pierre
Cheah You Wei
Chen Jiquan
Chen Shiping
Chini Isaac
Christensen Torben R.
Christianson Danielle
Chu Housen
Cleverly James
Collalti Alessio
Consalvo Claudia
Cook Bruce D.
Cook David
Coursolle Carole
Cremonese Edoardo
Curtis Peter S.
D'Andrea Ettore
da Rocha Humberto
Dai Xiaoqin
Davis Kenneth J.
De Cinti Bruno
de Dios Victor Resco
de Grandcourt Agnes
De Ligne Anne
De Oliveira Raimundo C.
Delpierre Nicolas
Desai Ankur R.
Di Bella Carlos Marcelo
di Tommasi Paul
Dolman Han
Domingo Francisco
Dong Gang
Dore Sabina
Duce Pierpaolo
Dufrêne Eric
Dunn Allison
Dušek Jiří
Eamus Derek
Eichelmann Uwe
Elbashandy Abdelrahman
ElKhidir Hatim Abdalla M.
Eugster Werner
Ewenz Cacilia M.
Ewers Brent
Famulari Daniela
Fares Silvano
Feigenwinter Iris
Feitz Andrew
Fensholt Rasmus
Filippa Gianluca
Fischer Marc
Frank John
Galvagno Marta
Gharun Mana
Gianelle Damiano
Gielen Bert
Gioli Beniamino
Gitelson Anatoly
Goded Ignacio
Goeckede Mathias
Goldstein Allen H.
Gough Christopher M.
Goulden Michael L.
Graf Alexander
Griebel Anne
Gruening Carsten
Grünwald Thomas
Hammerle Albin
Han Shijie
Han Xingguo
Hansen Birger Ulf
Hanson Chad
Hatakka Juha
He Yongtao
Hehn Markus
Heinesch Bernard
Hinko-Najera Nina
Humphrey Marty
Hutley Lindsay
Hörtnagl Lukas
Ibrom Andreas
Ikawa Hiroki
Isaac Peter
Jackowicz-Korczynski Marcin
Janouš Dalibor
Jans Wilma
Jassal Rachhpal
Jiang Shicheng
Kato Tomomichi
Khomik Myroslava
Klatt Janina
Knohl Alexander
Knox Sara
Kobayashi Hideki
Koerber Georgia
Kolle Olaf
Kosugi Yoshiko
Kotani Ayumi
Kowalski Andrew
Kruijt Bart
Kurbatova Julia
Kutsch Werner L.
Kwon Hyojung
Launiainen Samuli
Laurila Tuomas
Law Bev
Leuning Ray
Li Yingnian
Li Yuelin
Liddell Michael
Limousin Jean Marc
Lion Marryanna
Liska Adam J.
Lohila Annalea
Loubet Benjamin
Loustau Denis
Lucas-Moffat Antje
López-Ballesteros Ana
López-Blanco Efrén
Lüers Johannes
Ma Siyan
Macfarlane Craig
Magliulo Vincenzo
Maier Regine
Mammarella Ivan
Manca Giovanni
Marchesini Luca Belelli
Marcolla Barbara
Margolis Hank A.
Marras Serena
Massman William
Mastepanov Mikhail
Matamala Roser
Matthes Jaclyn Hatala
Mazzenga Francesco
McCaughey Harry
McHugh Ian
McMillan Andrew M.S.
Merbold Lutz
Meyer Wayne
Meyers Tilden
Miller Scott D.
Minerbi Stefano
Moderow Uta
Monson Russell K.
Montagnani Leonardo
Moore Caitlin E.
Moors Eddy
Moreaux Virginie
Moureaux Christine
Munger J. William
Nakai Taro
Neirynck Johan
Nesic Zoran
Nicolini Giacomo
Noormets Asko
Northwood Matthew
Nosetto Marcelo
Nouvellon Yann
Novick Kimberly
Oechel Walter
Olesen Jørgen Eivind
Ourcival Jean Marc
Papale Dario
Papuga Shirley A.
Parmentier Frans Jan
Pastorello Gilberto
Paul-Limoges Eugenie
Pavelka Marian
Peichl Matthias
Pendall Elise
Phillips Richard P.
Pilegaard Kim
Pirk Norbert
Poindexter Cristina
Polidori Diego
Posse Gabriela
Powell Thomas
Prasse Heiko
Prober Suzanne M.
Rambal Serge
Rannik Üllar
Raz-Yaseef Naama
Rebmann Corinna
Reed David
Reichstein Marcus
Restrepo-Coupe Natalia
Reverter Borja R.
Ribeca Alessio
Roland Marilyn
Sabbatini Simone
Sachs Torsten
Saleska Scott R.
Sanchez-Mejia Zulia M.
Schmid Hans Peter
Schmidt Marius
Schneider Karl
Schrader Frederik
Schroder Ivan
Scott Russell L.
Sedlák Pavel
Serrano-Ortíz Penélope
Shao Changliang
Shi Peili
Shironya Ivan
Siebicke Lukas
Silberstein Richard
Sirca Costantino
Spano Donatella
Steinbrecher Rainer
Stevens Robert M.
Sturtevant Cove
Suyker Andy
Sánchez-Cañete Enrique P.
Tagesson Torbern
Takanashi Satoru
Tang Yanhong
Tapper Nigel
Thom Jonathan
Tomassucci Michele
Torn Margaret
Trotta Carlo
Tuovinen Juha Pekka
Urbanski Shawn
Valentini Riccardo
van der Molen Michiel
van Gorsel Eva
van Huissteden Ko
van Ingen Catharine
Varlagin Andrej
Verfaillie Joseph
Vesala Timo
Vincke Caroline
Vitale Domenico
Vuichard Nicolas
Vygodskaya Natalia
Walker Jeffrey P.
Walter-Shea Elizabeth
Wang Huimin
Weber Robin
Westermann Sebastian
Wille Christian
Wofsy Steven
Wohlfahrt Georg
Wolf Sebastian
Woodgate William
Zampedri Roberto
Zhang Junhui
Zhang Leiming
Zhou Guoyi
Zona Donatella
Šigut Ladislav
Publication venue
Publication date: 01/01/2020
Field of study

The FLUXNET2015 dataset provides ecosystem-scale data on CO2, water, and energy exchange between the biosphere and the atmosphere, and other meteorological and biological measurements, from 212 sites around the globe (over 1500 site-years, up to and including year 2014). These sites, independently managed and operated, voluntarily contributed their data to create global datasets. Data were quality controlled and processed using uniform methods, to improve consistency and intercomparability across sites. The dataset is already being used in a number of applications, including ecophysiology studies, remote sensing studies, and development of ecosystem and Earth system models. FLUXNET2015 includes derived-data products, such as gap-filled time series, ecosystem respiration and photosynthetic uptake estimates, estimation of uncertainties, and metadata about the measurements, presented for the first time in this paper. In addition, 206 of these sites are for the first time distributed under a Creative Commons (CC-BY 4.0) license. This paper details this enhanced dataset and the processing methods, now made available as open-source codes, making the dataset more accessible, transparent, and reproducible. Peer reviewed

Epsilon Open Archive

VU Research Portal

Jukuri

ResearchOnline at James Cook University

Repositorio Institucional Universidad de Granada

Electronic Publication Information Center

Repositori Obert UdL

Open Repository and Bibliography - Liège

of Botany, Chinese Academy of Sciences

NORA - Norwegian Open Research Archives

White Rose Research Online

Online Research Database In Technology